I chose a map sector covering a dynamically developing area of the UAE. To display the area I used the “ggmap” package and the coordinates of this area from dubai_abu-dhabi.osm.
dubai_gmap <- get_map(location=c(lon = 55.2708, lat = 25.2048),
source = "google", maptype = "hybrid", zoom = 8)
ggmap(dubai_gmap, extent = "normal")
osmmap <- get_map(location = c(53.5800,23.7350,56.8870,26.5390), source = "osm")
ggmap(osmmap, extent = "normal")
Below are some examples of using the ggmap package beyond simply displaying maps.
gc01 <- geocode("Jumerah Gardens", output = "more")
formattable(data.frame(gc01))
gc03 <- geocode("Dubai International Airport", output = "more")
formattable(data.frame(gc03))
formattable(data.frame(mapdist("dubai", "abu-dhabi")))
formattable(data.frame(mapdist("Jumerah Gardens", "Dubai International Airport")))
var_ways <- route('The Dubai Mall', 'Business Bay', alternatives = TRUE)
formattable(head(data.frame(var_ways)))
ggplot(data = var_ways) + coord_map() +
geom_leg(aes(x = startLon, xend = endLon, y = startLat, yend = endLat, color = route))
qmap(location=c(55.2820, 25.1900), zoom = 15, maptype = 'roadmap',
base_layer = ggplot(aes(x = startLon, y = startLat), data = var_ways)) +
geom_leg(aes(x = startLon, xend = endLon,
y = startLat, yend = endLat, color = route),
alpha = 0.5, size = 2, data = var_ways) +
labs(x = 'Longitude', y = 'Latitude', colour = 'Route') +
facet_wrap(~ route, ncol = 3) + theme(legend.position = 'top')
way_map <- get_map(location = c(55.2820, 25.1900),
source = "google", zoom = 15, maptype = "hybrid")
ggmap(way_map) + geom_leg(data = var_ways,
aes(x = startLon, xend = endLon,
y = startLat, yend = endLat, color = route),
alpha = 0.7, size = 2)
There are several ways to extract geodata. One of them is to do it directly in R code cells.
This set of commands downloads the data from the OpenStreetMap API for a bounding box defined by its centre coordinates.
src <- osmsource_api()
bigbox <- center_bbox(55.2708, 25.2048, 6000, 6000)
bdubai <- get_osm(bigbox, source = src)
str(bdubai)
List of 3
$ nodes :List of 2
..$ attrs:'data.frame': 47528 obs. of 9 variables:
.. ..$ id : num [1:47528] 30593914 30593915 31473923 31474006 31474005 ...
.. ..$ visible : Factor w/ 1 level "true": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ timestamp: POSIXlt[1:47528], format: "2016-08-19 09:40:14" "2010-12-14 12:40:14" "2010-12-02 12:23:45" ...
.. ..$ version : num [1:47528] 19 4 2 5 5 5 2 5 2 2 ...
.. ..$ changeset: num [1:47528] 41552017 6657884 6514101 7313392 7313392 ...
.. ..$ user : Factor w/ 203 levels "08xavstj","12Katniss",..: 60 172 172 182 182 182 172 172 172 77 ...
.. ..$ uid : Factor w/ 203 levels "1069176","10927",..: 60 2 2 41 41 41 2 2 2 81 ...
.. ..$ lat : num [1:47528] 25.2 25.2 25.2 25.2 25.2 ...
.. ..$ lon : num [1:47528] 55.3 55.3 55.3 55.3 55.3 ...
..$ tags :'data.frame': 1956 obs. of 3 variables:
.. ..$ id: num [1:1956] 9.11e+07 9.50e+07 9.50e+07 2.60e+08 2.81e+08 ...
.. ..$ k : Factor w/ 104 levels "access","addr:city",..: 36 36 72 12 86 50 41 50 51 53 ...
.. ..$ v : Factor w/ 738 levels "-1","+18006437560",..: 659 486 63 349 453 469 295 150 720 150 ...
..- attr(*, "class")= chr [1:3] "nodes" "osmar_element" "list"
$ ways :List of 3
..$ attrs:'data.frame': 6547 obs. of 7 variables:
.. ..$ id : num [1:6547] 4.86e+06 1.04e+08 1.04e+08 1.04e+08 1.06e+07 ...
.. ..$ visible : Factor w/ 1 level "true": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ timestamp: POSIXlt[1:6547], format: "2014-05-05 11:47:38" "2011-03-12 17:37:11" "2011-03-12 17:37:11" ...
.. ..$ version : num [1:6547] 9 1 1 2 10 7 5 8 6 4 ...
.. ..$ changeset: num [1:6547] 22145147 7535955 7535955 35985485 16483397 ...
.. ..$ user : Factor w/ 125 levels "12Katniss","13 digits",..: 29 111 111 5 28 111 28 78 28 104 ...
.. ..$ uid : Factor w/ 125 levels "1069176","10927",..: 123 30 30 66 39 30 39 40 39 2 ...
..$ tags :'data.frame': 10072 obs. of 3 variables:
.. ..$ id: num [1:10072] 4.86e+06 4.86e+06 4.86e+06 1.04e+08 1.04e+08 ...
.. ..$ k : Factor w/ 135 levels "access","access:note",..: 58 74 96 58 58 25 79 58 68 74 ...
.. ..$ v : Factor w/ 858 levels "-1","-2","+971 4 323 0000",..: 625 257 814 703 703 814 798 624 1 26 ...
..$ refs :'data.frame': 56522 obs. of 2 variables:
.. ..$ id : num [1:56522] 4.86e+06 4.86e+06 4.86e+06 4.86e+06 1.04e+08 ...
.. ..$ ref: num [1:56522] 9.10e+07 2.84e+09 2.84e+09 9.10e+07 9.39e+07 ...
..- attr(*, "class")= chr [1:3] "ways" "osmar_element" "list"
$ relations:List of 3
..$ attrs:'data.frame': 52 obs. of 7 variables:
.. ..$ id : num [1:52] 2757400 2757402 2757403 1320963 1320964 ...
.. ..$ visible : Factor w/ 1 level "true": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ timestamp: POSIXlt[1:52], format: "2013-02-13 16:02:48" "2013-02-13 16:02:48" "2013-02-13 16:02:48" ...
.. ..$ version : num [1:52] 1 1 1 1 1 2 1 1 1 1 ...
.. ..$ changeset: num [1:52] 15019545 15019545 15019545 6657884 6657884 ...
.. ..$ user : Factor w/ 16 levels "4b696d","Alex111X",..: 15 15 15 13 13 15 15 10 10 10 ...
.. ..$ uid : Factor w/ 16 levels "10927","114220",..: 5 5 5 1 1 5 5 6 6 6 ...
..$ tags :'data.frame': 287 obs. of 3 variables:
.. ..$ id: num [1:287] 2757400 2757400 2757402 2757402 2757403 ...
.. ..$ k : Factor w/ 184 levels "alt_name:af",..: 175 180 175 180 175 180 175 180 175 180 ...
.. ..$ v : Factor w/ 167 levels "-1","#CC0000",..: 83 102 83 102 83 102 83 102 83 102 ...
..$ refs :'data.frame': 1523 obs. of 4 variables:
.. ..$ id : num [1:1523] 2757400 2757400 2757400 2757402 2757402 ...
.. ..$ type: Factor w/ 2 levels "node","way": 2 2 1 2 2 1 2 2 1 2 ...
.. ..$ ref : num [1:1523] 2.05e+08 2.05e+08 2.15e+09 2.05e+08 2.05e+08 ...
.. ..$ role: Factor w/ 11 levels "","cable","from",..: 3 10 11 3 10 11 3 10 11 3 ...
..- attr(*, "class")= chr [1:3] "relations" "osmar_element" "list"
- attr(*, "class")= chr [1:2] "osmar" "list"
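To get a feel for what a 6000 by 6000 box around (55.2708, 25.2048) means in degrees, here is a rough sketch in base R. It assumes that the last two arguments of `center_bbox` are a width and a height in metres, and it uses a simple spherical-Earth approximation. `approx_center_bbox` is a hypothetical helper, not part of osmar, so its result may differ slightly from the box osmar actually requests.

```r
# Approximate a lon/lat bounding box centred on a point, with sizes in metres.
# Assumption: spherical Earth with radius 6371 km (not osmar's own computation).
approx_center_bbox <- function(lon, lat, width_m, height_m, radius_m = 6371000) {
  # Half the height, converted from metres to degrees of latitude
  dlat <- (height_m / 2) / radius_m * 180 / pi
  # Half the width, converted to degrees of longitude at this latitude
  dlon <- (width_m / 2) / (radius_m * cos(lat * pi / 180)) * 180 / pi
  c(left = lon - dlon, bottom = lat - dlat, right = lon + dlon, top = lat + dlat)
}

bb <- approx_center_bbox(55.2708, 25.2048, 6000, 6000)
print(round(bb, 4))
```

The box spans a little more in longitude than in latitude because a degree of longitude is shorter than a degree of latitude at 25° N.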
Node tags:
node_tags <- sort(unique(bdubai$nodes$tags$k))
print(node_tags)
[1] access addr:city addr:country
[4] addr:flats addr:housename addr:housenumber
[7] addr:place addr:postcode addr:street
[10] addr:subdistrict aeroway amenity
[13] barrier bench bicycle
[16] building bus capacity
[19] construction contact:instagram country
[22] covered crossing cuisine
[25] delivery description diplomatic
[28] direction drive_in drive_through
[31] ele email entrance
[34] fee foot highway
[37] horse indoor_seating internet_access
[40] internet_access:fee is_in layer
[43] leisure level levels
[46] lit man_made maxspeed
[49] motor_vehicle name name:ar
[52] name:de name:en name:fr
[55] name:ko name:pl name:ru
[58] natural note office
[61] opening_hours operator outdoor_seating
[64] parking payment:bitcoin phone
[67] place platforms power
[70] public_transport railway ref
[73] religion seamark:beacon_lateral:category seamark:beacon_lateral:colour
[76] seamark:beacon_lateral:system seamark:information seamark:light:character
[79] seamark:light:colour seamark:light:group seamark:light:period
[82] seamark:light:reference seamark:name seamark:type
[85] shelter shop shower
[88] smoking source sport
[91] station subway supervised
[94] surface surveillance surveillance:type
[97] surveillance:zone takeaway tourism
[100] traffic_calming type website
[103] wheelchair wikipedia
104 Levels: access addr:city addr:country addr:flats addr:housename addr:housenumber ... wikipedia
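Besides listing the distinct keys, it can be useful to see how often each key occurs. The following is a minimal sketch on a toy data frame with the same `id`, `k`, `v` columns as `bdubai$nodes$tags`; the rows in `toy_tags` are invented for illustration.

```r
# Toy tag table shaped like bdubai$nodes$tags (values are made up)
toy_tags <- data.frame(
  id = c(1, 1, 2, 3, 3, 4),
  k  = c("highway", "name", "highway", "amenity", "name", "highway"),
  v  = c("traffic_signals", "A", "bus_stop", "cafe", "B", "crossing"),
  stringsAsFactors = TRUE
)

# Count how many times each key appears, most frequent first
key_counts <- sort(table(toy_tags$k), decreasing = TRUE)
print(head(key_counts, 3))
```

With the real data, the same two lines could be applied to `bdubai$nodes$tags$k` or `bdubai$ways$tags$k`.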
Way tags:
way_tags <- sort(unique(bdubai$ways$tags$k))
print(way_tags)
[1] access access:note addr:city addr:country
[5] addr:housename addr:housenumber addr:postcode addr:street
[9] addr:suburb admin_level aerialway aeroway
[13] alt_name alt_name:hu alt_name2 alt_old_name:hu
[17] amenity area atm barrier
[21] bicycle boundary bridge bridge:structure
[25] building building:height building:levels building:material
[29] building:part bus cables capacity
[33] construction contact:email contact:facebook contact:fax
[37] contact:google_plus contact:instagram contact:phone contact:twitter
[41] contact:website covered created_by crossing
[45] cutting description destination destination:lanes
[49] ele email escalator fee
[53] fence_type foot footway frequency
[57] height highway highway_1 horse
[61] hotel indoor internet_access is_in
[65] junction landuse lanes layer
[69] leisure level lit loc_name
[73] man_made maxspeed maxspeed:hgv maxstay
[77] mooring motor_vehicle name name:ar
[81] name:en name:et name:he name:hu
[85] name:ko name:loc name:ru name:sl
[89] name:uk name:zh natural note
[93] office old_name old_name:hu oneway
[97] opening_hours operator park_ride parking
[101] phone place power public_transport:version
[105] railway ref religion roof:material
[109] roof:shape room rooms service
[113] shop sloped_curb smoking source
[117] sport stars start_date surface
[121] tactile_paving toll tourism tracktype
[125] tunnel turn:lanes voltage water
[129] waterway website wheelchair wheelchair:description
[133] wikidata wikipedia wires
135 Levels: access access:note addr:city addr:country addr:housename addr:housenumber ... wires
Users:
users <- sort(unique(bdubai$nodes$attrs$user))
print(head(users, 12))
[1] 08xavstj 12Katniss 13 digits Abdulaziz AlSweda
[5] acltpe Ahamed Zulfan ahmed abdo edries nasur Ahmed Arafa40
[9] Akos Vancza Alex111X alwasam6 amanza
203 Levels: 08xavstj 12Katniss 13 digits Abdulaziz AlSweda acltpe ... نبيل الغسيني
plot(bdubai)
ts <- find(bdubai, node(tags(v == "traffic_signals")))
ts_dubai <- subset(bdubai, node_ids = ts)
bs <- find(bdubai, node(tags(v %agrep% "busstop")))
bs_dubai <- subset(bdubai, node_ids = bs)
hw <- find(bdubai, way(tags(k == "highway")))
hw <- find_down(bdubai, way(hw))
hw_dubai <- subset(bdubai, ids = hw)
tu <- find(bdubai, way(tags(k == "tunnel")))
tu <- find_down(bdubai, way(tu))
tu_dubai <- subset(bdubai, ids = tu)
plot_ways(hw_dubai, col = "steelblue")
plot_ways(tu_dubai, add = TRUE, col = "magenta")
plot_nodes(ts_dubai, add = TRUE, col = "red")
plot_nodes(bs_dubai, add = TRUE, col = "blue")
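The `%agrep%` operator used above to find bus stops performs approximate matching. A small base R sketch of the idea with `agrep()`, on invented tag values, is given here; it assumes (without checking osmar's source) that `%agrep%` behaves in a similar way.

```r
# Invented tag values; in the real data these would come from bdubai$nodes$tags$v
tag_values <- c("bus_stop", "traffic_signals", "cafe")

# Approximate matching tolerates small differences such as the missing underscore
hits <- agrep("busstop", tag_values, value = TRUE)
print(hits)
```

Approximate matching is convenient for free-text tags, but it can also pick up unintended values, so the result is worth inspecting before plotting.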
bg <- find(bdubai, way(tags(k == "building")))
bg <- find_down(bdubai, way(bg))
bg_dubai <- subset(bdubai, ids = bg)
bg_poly <- as_sp(bg_dubai, "polygons")
spplot(bg_poly, col.regions=brewer.pal(12, "Set3"), c("version"))
# bus <- find(bdubai, relation(tags(v == "bus")))
# bus_dubai <- lapply(bus, function(i) { as_sp(get_osm(relation(i), full = TRUE), "lines") })
bs_points <- as_sp(bs_dubai, "points")
hw_line <- as_sp(hw_dubai, "lines")
# for ( i in seq(along = bus_dubai) ) { plot(bus[[i]], add = TRUE, col = "blue") }
plot(bg_poly, col = "lightsteelblue")
plot(hw_line, add = TRUE, col = "blue")
plot(bs_points, add = TRUE, col = "red")
dad_box <- get_bbox(c(55.2608, 25.1948, 55.2808, 25.2148))
dad_buildings <- extract_osm_objects(key='building', bbox=dad_box)
dad_map <- osm_basemap(bbox = dad_box, bg = 'lightsteelblue')
dad_map <- add_osm_objects(dad_map, dad_buildings, col = 'steelblue')
print_osm_map(dad_map)
buildings_dad <- readLines("/Users/olgabelitskaya/large-repo/dubai_abu-dhabi.imposm-geojson/dubai_abu-dhabi_buildings.geojson") %>% paste(collapse = "\n")
leaflet() %>% setView(lng = 55.2708, lat = 25.2048, zoom = 10) %>%
addTiles() %>%
addGeoJSON(buildings_dad, weight = 1, color = "steelblue", fill = FALSE)
Another possible way is to download ready-made data files in many different formats from the website: https://mapzen.com/data/metro-extracts/metro/dubai_abu-dhabi/ . The files dubai_abu-dhabi.osm, dubai_abu-dhabi_buildings.geojson, etc. were downloaded. The data from the .osm file were then extracted into CSV and JSON formats using specially written Python functions.
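The extraction itself was done in Python and is not shown here. As an illustration of the target layout only, the following R sketch flattens a few invented OSM-like elements into a `nodes` table and a `nodes_tags` table with `key` and `value` columns. The list `toy_elements`, the table names and the output file name are all assumptions made for the example.

```r
# Invented elements standing in for parsed OSM nodes
toy_elements <- list(
  list(id = 1, lat = 25.20, lon = 55.27, tags = c(highway = "traffic_signals")),
  list(id = 2, lat = 25.21, lon = 55.28, tags = c(amenity = "cafe", name = "Example"))
)

# One row per node with its basic attributes
toy_nodes <- do.call(rbind, lapply(toy_elements, function(e) {
  data.frame(id = e$id, lat = e$lat, lon = e$lon)
}))

# One row per (node, tag) pair
toy_nodes_tags <- do.call(rbind, lapply(toy_elements, function(e) {
  data.frame(id = e$id, key = names(e$tags), value = unname(e$tags))
}))

# Write one of the tables to a temporary CSV file
write.csv(toy_nodes, file.path(tempdir(), "nodes_toy.csv"), row.names = FALSE)
print(toy_nodes_tags)
```

Keeping tags in a separate long table avoids a very wide and mostly empty node table, since each element carries only a few of the many possible keys.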
Sizes (in bytes) of the downloaded .osm file and of the JSON and CSV files extracted from it:
print(file.size("/Users/olgabelitskaya/large-repo/dubai_abu-dhabi.osm"))
[1] 394382598
print(file.size("/Users/olgabelitskaya/large-repo/dubai_abu-dhabi.osm.json"))
[1] 458155339
print(file.size("/Users/olgabelitskaya/large-repo/nodes.csv"))
[1] 154228820
print(file.size("/Users/olgabelitskaya/large-repo/nodes_tags.csv"))
[1] 3912302
print(file.size("/Users/olgabelitskaya/large-repo/ways.csv"))
[1] 13797779
print(file.size("/Users/olgabelitskaya/large-repo/ways_tags.csv"))
[1] 13383027
print(file.size("/Users/olgabelitskaya/large-repo/ways_nodes.csv"))
[1] 55135540
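Byte counts of this size are hard to compare at a glance. A short sketch that converts the values printed above to megabytes (taking 1 MB as 1024^2 bytes); the vector names are labels chosen for this example.

```r
# File sizes in bytes, copied from the output above
sizes_bytes <- c(
  osm = 394382598, json = 458155339, nodes = 154228820, nodes_tags = 3912302,
  ways = 13797779, ways_tags = 13383027, ways_nodes = 55135540
)

# Convert to megabytes and round to one decimal place
sizes_mb <- round(sizes_bytes / 1024^2, 1)
print(sizes_mb)
```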
The following lines of code write the contents of the CSV files to an SQLite database.
sqlite <- dbDriver("SQLite")
dubai_abu_dhabi <- dbConnect(sqlite,"dubai_abu_dhabi.sqlite3")
nodes <- read.csv('/Users/olgabelitskaya/large-repo/nodes.csv')
nodes_tags <- read.csv('/Users/olgabelitskaya/large-repo/nodes_tags.csv')
ways <- read.csv('/Users/olgabelitskaya/large-repo/ways.csv')
ways_tags <- read.csv('/Users/olgabelitskaya/large-repo/ways_tags.csv')
ways_nodes <- read.csv('/Users/olgabelitskaya/large-repo/ways_nodes.csv')
dbWriteTable(conn = dubai_abu_dhabi, name = 'nodes', value = nodes, row.names = FALSE)
dbWriteTable(conn = dubai_abu_dhabi, name = 'nodes_tags', value = nodes_tags, row.names = FALSE)
dbWriteTable(conn = dubai_abu_dhabi, name = 'ways', value = ways, row.names = FALSE)
dbWriteTable(conn = dubai_abu_dhabi, name = 'ways_tags', value = ways_tags, row.names = FALSE)
dbWriteTable(conn = dubai_abu_dhabi, name = 'ways_nodes', value = ways_nodes, row.names = FALSE)
With simple queries against the database, we can select the information of interest.
Examples of nodes and ways:
formattable(sqldf("select * from nodes limit 3", dbname = "dubai_abu_dhabi"))
formattable(sqldf("select * from ways limit 3", dbname = "dubai_abu_dhabi"))
We can find the number of nodes and ways as well.
formattable(sqldf("SELECT COUNT(*) FROM nodes;"))
formattable(sqldf("SELECT COUNT(*) FROM ways;"))
The number of users:
formattable(sqldf("SELECT COUNT(DISTINCT(e.uid)) FROM \
(SELECT uid FROM nodes UNION ALL SELECT uid FROM ways) e;"))
The database lets us evaluate the contribution of each individual user to map editing.
Let us list the three most active editors of this map section:
formattable(sqldf("SELECT e.user, COUNT(*) as num \
FROM (SELECT user FROM nodes UNION ALL SELECT user FROM ways) e \
GROUP BY e.user \
ORDER BY num DESC \
LIMIT 3;"))
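For comparison, the same two questions (how many distinct contributors, and who contributed most) can be answered without SQL. This is a minimal base R sketch on invented data frames; `toy_nodes` and `toy_ways` stand in for the real `nodes` and `ways` tables and only carry a `user` column.

```r
# Invented stand-ins for the nodes and ways tables
toy_nodes <- data.frame(user = c("a", "b", "a", "c"))
toy_ways  <- data.frame(user = c("a", "b"))

# Equivalent of UNION ALL: stack the user columns, keeping duplicates
all_users <- c(as.character(toy_nodes$user), as.character(toy_ways$user))

# Number of distinct contributors
n_users <- length(unique(all_users))

# Contributions per user, largest first, top three
top_users <- head(sort(table(all_users), decreasing = TRUE), 3)

print(n_users)
print(top_users)
```

Keeping duplicates (as `UNION ALL` does) matters for the ranking, because each row represents one edited element.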
A list of the 3 most common types of places:
formattable(sqldf("SELECT value, COUNT(*) as num \
FROM nodes_tags \
WHERE key='place' \
GROUP BY value \
ORDER BY num DESC \
LIMIT 3;"))
A list of the 10 most common types of buildings:
formattable(sqldf("SELECT value, COUNT(*) as num \
FROM nodes_tags \
WHERE key='building' \
GROUP BY value \
ORDER BY num DESC \
LIMIT 10;"))
A list of the 20 most common streets:
formattable(sqldf("SELECT value, COUNT(*) as num \
FROM nodes_tags \
WHERE key='street' \
GROUP BY value \
ORDER BY num DESC \
LIMIT 20;"))
dbDisconnect(dubai_abu_dhabi)
[1] TRUE
In a very similar way we can import the data from the JSON file into MongoDB.
Start by running ‘mongod’ in a terminal; stop it with ‘Ctrl+C’ when finished.
m <- mongo("openstreetmap", verbose = FALSE)
# stream_in(file("dubai_abu-dhabi.osm.json"), handler = function(df){m$insert(df)})
The number of documents:
m$count()
[1] 2124505
The three most active editors of this map section:
top_users <- m$aggregate('[
{ "$group" : { "_id" : "$created.user", "count" : { "$sum" : 1} } },
{ "$sort" : { "count" : -1} }, { "$limit" : 3 }
]')
formattable(top_users)
The number of users with only one document, followed by a list of 10 users with the fewest documents:
number_oonu <- m$aggregate('[
{ "$group" : { "_id" : "$created.user", "count" : { "$sum" : 1} } },
{ "$group" : { "_id" : "$count", "count" : { "$sum" : 1} } },
{ "$sort" : { "_id" : 1} }, { "$limit" : 1 }
]')
formattable(number_oonu)
ten_oonu <- m$aggregate('[
{ "$group" : { "_id" : "$created.user", "count" : { "$sum" : 1} } },
{ "$sort" : { "count" : 1} }, { "$limit" : 10 }
]')
formattable(ten_oonu)
A list of the 3 most common places:
places <- m$aggregate('[
{ "$match" : { "address.place" : { "$exists" : 1} } },
{ "$group" : { "_id" : "$address.place", "count" : { "$sum" : 1} } },
{ "$sort" : { "count" : -1}}, {"$limit":3}
]')
formattable(places)
A list of the 10 most common types of buildings:
buildings <- m$aggregate('[
{ "$match": { "building": { "$exists": 1}}},
{ "$group": { "_id": "$building", "count": { "$sum": 1}}},
{ "$sort": { "count": -1}}, {"$limit": 10}
]')
formattable(buildings)
A list of the 10 most common facilities:
facilities <- m$aggregate('[
{ "$match": { "amenity": { "$exists": 1}}},
{ "$group": { "_id": "$amenity", "count": { "$sum": 1}}},
{ "$sort": { "count": -1}}, { "$limit": 10}
]')
formattable(facilities)
A list of the 3 most common zipcodes:
postcodes <- m$aggregate('[
{ "$match" : { "address.postcode" : { "$exists" : 1} } },
{ "$group" : { "_id" : "$address.postcode", "count" : { "$sum" : 1} } },
{ "$sort" : { "count" : -1}}, {"$limit": 3}
]')
formattable(postcodes)
Counting zipcodes with one document:
postcodes_od <- m$aggregate('[
{ "$group" : {"_id" : "$address.postcode", "count" : { "$sum" : 1} } },
{ "$group" : {"_id" : "$count", "count": { "$sum" : 1} } },
{ "$sort" : {"_id" : 1} }, { "$limit" : 1}
]')
formattable(postcodes_od)
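The pipeline above groups twice: first by postcode, then by the resulting count. The same "count of counts" idea can be sketched in base R with a nested `table()` call. The postcodes below are invented for illustration.

```r
# Invented postcode values, one per document
toy_postcodes <- c("00000", "12345", "12345", "54321", "00000", "99999")

# First grouping: number of documents per postcode
per_code <- table(toy_postcodes)

# Second grouping: number of postcodes that occur once, twice, ...
freq_of_counts <- table(per_code)

print(freq_of_counts)
```

The entry named "1" in `freq_of_counts` is the number of postcodes that appear in exactly one document, which is what the sorted and limited pipeline returns when such postcodes exist.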
Examples of summary statistics for this collection:
m$info()$stats$ns
[1] "test.openstreetmap"
m$info()$stats$size
[1] 502488587
m$info()$stats$avgObjSize
[1] 236
m$info()$stats$storageSize
[1] 155140096
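These figures can be cross-checked against each other. A small sketch using the numbers printed above: the average object size should be roughly the data size divided by the document count reported earlier by `m$count()`. The variable names are chosen for this example.

```r
# Values copied from the output above
size_bytes    <- 502488587   # stats$size
storage_bytes <- 155140096   # stats$storageSize
n_docs        <- 2124505     # m$count()

# Average document size in bytes
avg_obj_size <- size_bytes / n_docs

# Share of the data size that is actually occupied on disk
storage_ratio <- storage_bytes / size_bytes

print(floor(avg_obj_size))
print(round(storage_ratio, 2))
```

The integer part of the computed average matches the reported `avgObjSize` of 236, and the storage size is less than a third of the data size.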
One of the main problems of public maps is that place names are not consistently duplicated in other languages. If the translation process could be automated by building up a common database of map names in many languages, it would save users from many difficulties and mistakes.
The next problem is the existence of a large number of separate databases (including mapping databases) that describe the same map objects. Procedures for integrating the data that are already available would spare many people unnecessary work and save time and effort.
Obviously, the information about the number of buildings and their purpose is incomplete. The completeness of public maps can be increased by bringing new users into the mapping process. To this end, entering information should be as simple as possible: for example, choosing from the available options, with automatic filling of the fields linked to the chosen option (for example, linking the name of a street to the administrative area in which it is located).
As in any public data, there are a number of mistakes and typos. Well-known methods can be proposed for correcting them: automatic comparison with existing data and verification of new data by other users.
The lack of a uniform postal code system in this particular dataset complicates the identification and verification of postcodes.
nodes - points in space with basic characteristics (lat, long, id, tags);
ways - defining linear features and area boundaries (an ordered list of nodes);
relations - tags and also an ordered list of nodes, ways and/or relations as members which is used to define logical or geographic relationships between other elements.
With the help of a specific set of commands we can perform a statistical description of the data collections and the database.
This project has been educational for me. I believe that one of its main tasks was to study methods for extracting and exploring openly accessible map data. For example, I used a systematic sample of elements from the original .osm file to try out the processing functions before applying them to the whole dataset. As a result, I have gained some useful new skills in parsing, processing, storing, aggregating and applying data.
During the research I read through quite a lot of other students' projects on this topic. After my own research and a review of other authors' results, I formed a definite opinion about the ideas behind OpenStreetMap.
This website can be viewed as a testing ground for the interaction of a large number of people (including non-professionals) in creating a unified information space. The prospects of such cooperation cannot be overemphasized. The success of the project will make it possible to implement ambitious plans in the field of accessible information technologies, the creation of virtual reality and many other areas.
An increase in the number of users leads to many positive effects in projects of this kind:
a rapid improvement in the accuracy, completeness and timeliness of information;
a closer match between the information space and reality, and more objective evaluation of the data;
a reduction in the effort needed to clean erroneous details from the data.
Ideas for improving the OpenStreetMap project are simple and natural.
The number of users can be increased by offering additional options such as rating marks (e.g., the best restaurant or the most convenient parking).
The popularity of the project may also grow thanks to temporary pop-up messages from users (displayed for no more than 1-3 hours) with current information about a geographic location (e.g., the presence of traffic jams).